6 research outputs found

    Time series motifs statistical significance

    Time series motif discovery is the task of extracting previously unknown recurrent patterns from time series data. It is an important problem in applications that range from finance to health. Many algorithms have been proposed for efficiently finding motifs. Surprisingly, most of these proposals do not address how to evaluate the discovered motifs, which are typically assessed by human experts. This is infeasible even for moderately sized datasets, since the number of discovered motifs tends to be prohibitively large. Statistical significance tests are widely used in the bioinformatics and association rule mining communities to evaluate extracted patterns. In this work we present an approach to calculate the statistical significance of time series motifs. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution makes it possible to apply a powerful technique, statistical tests, in a time series setting. This provides researchers and practitioners with an important tool to automatically evaluate the degree of relevance of each extracted motif.
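    The abstract above describes estimating a motif's expected frequency with a Markov chain and comparing it to the observed frequency. A minimal Python sketch of that idea follows. It rests on assumptions the abstract does not state: a first-order chain fitted to the symbol sequence itself, and a one-sided binomial test that treats window positions as independent (a simplification, since overlapping windows are correlated). The function names are hypothetical.

```python
from collections import Counter
from math import comb


def markov_word_probability(seq, word):
    """Probability of `word` under a first-order Markov chain fitted to `seq`."""
    symbol_counts = Counter(seq)
    pair_counts = Counter(zip(seq, seq[1:]))
    # How often each symbol starts a transition (the last symbol starts none).
    start_counts = Counter(seq[:-1])
    prob = symbol_counts[word[0]] / len(seq)
    for a, b in zip(word, word[1:]):
        if start_counts[a] == 0:
            return 0.0
        prob *= pair_counts[(a, b)] / start_counts[a]
    return prob


def count_occurrences(seq, word):
    """Number of (possibly overlapping) occurrences of `word` in `seq`."""
    m = len(word)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == word)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def motif_p_value(seq, word):
    """Return (observed count, expected count, one-sided p-value)."""
    n_positions = len(seq) - len(word) + 1
    p = markov_word_probability(seq, word)
    observed = count_occurrences(seq, word)
    expected = n_positions * p
    # Assumption: positions are treated as independent Bernoulli trials.
    return observed, expected, binomial_upper_tail(observed, n_positions, p)


if __name__ == "__main__":
    observed, expected, p_value = motif_p_value("abcabcabcabc", "abc")
    print(observed, round(expected, 3), round(p_value, 3))
```

    A small p-value would suggest the symbolic motif occurs more often than the chain predicts; the order of the chain and the choice of test are design decisions that this sketch fixes arbitrarily.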

    Multiresolution motif discovery in time series

    Time series motif discovery is an important problem with applications in a variety of areas that range from telecommunications to medicine. Several algorithms have been proposed to solve the problem. However, these algorithms rely heavily on expensive random disk accesses or assume the data can fit into main memory. They only consider motifs at a single resolution and are not suited to interactivity. In this work, we tackle the motif discovery problem as an approximate Top-K frequent subsequence discovery problem. We fully exploit the multiresolution capability of the state-of-the-art iSAX representation to obtain motifs at different resolutions. This property yields interactivity, allowing the user to navigate along the Top-K motif structure, which permits a deeper understanding of the time series database. Further, we apply the Top-K Space-Saving algorithm to our frequent subsequences approach. The result is a scalable algorithm suitable for data-stream applications where small-memory devices such as sensors are used. Our approach is scalable and disk-efficient since it needs only a single pass over the time series database. We provide empirical evidence of the validity of the algorithm on datasets from different areas that aim to represent practical applications.
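    The Top-K Space-Saving algorithm mentioned above keeps a fixed number of counters while making a single pass over a stream. The Python sketch below shows the standard counter-replacement scheme applied to a stream of symbolic words; it is an illustration, not the authors' implementation. The example words and the function name `space_saving` are hypothetical, and a linear scan for the minimum stands in for the more efficient structures used in practice.

```python
def space_saving(stream, capacity):
    """Approximate the most frequent items in `stream` using `capacity` counters.

    Returns (item, estimated_count, max_overestimate) tuples sorted by
    estimated count, highest first. The true count of each reported item
    lies between estimated_count - max_overestimate and estimated_count.
    """
    counts = {}
    errors = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest counter and let the newcomer
            # inherit that count; this is the source of the overestimate.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return sorted(
        ((w, c, errors[w]) for w, c in counts.items()),
        key=lambda t: -t[1],
    )


if __name__ == "__main__":
    # Hypothetical stream of symbolic words, e.g. discretised subsequences.
    words = ["ab", "ab", "cd", "ab", "ef", "ab", "cd"]
    for word, estimate, error in space_saving(words, capacity=2):
        print(word, estimate, error)
```

    Memory stays bounded by `capacity` regardless of stream length, which is the property that makes this family of algorithms attractive for small-memory devices.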

    Automatically estimating iSAX parameters

    The Symbolic Aggregate Approximation (iSAX) is widely used in time series data mining. Its popularity arises from the fact that it greatly reduces time series size, is symbolic, allows lower bounding and is space efficient. However, it requires setting two parameters, the symbolic length and the alphabet size, which limits the applicability of the technique. The optimal parameter values are highly application dependent. Typically, they are either set to a fixed value or experimentally probed for the best configuration. In this work we propose an approach to automatically estimate iSAX's parameters. The approach, AutoiSAX, not only discovers the best parameter setting for each time series in the database, but also finds the alphabet size for each iSAX symbol within the same word. It is based on simple and intuitive ideas from time series complexity and statistics. The technique can be smoothly embedded in existing data mining tasks as an efficient sub-routine. We analyze its impact on visualization interpretability, classification accuracy and motif mining. Our contribution aims to make iSAX a more general approach as it evolves towards a parameter-free method.
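    To make the two parameters concrete, the Python sketch below computes a classic SAX word from a numeric series: z-normalisation, piecewise aggregate averaging into `word_length` segments, then discretisation against equiprobable Gaussian breakpoints for `alphabet_size` symbols. It illustrates plain SAX with one alphabet size for the whole word; it does not implement iSAX's per-symbol cardinalities or the AutoiSAX estimation procedure. The function name is hypothetical, and the series length is assumed to be a multiple of the word length.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_word(series, word_length, alphabet_size):
    """Convert a numeric series into a SAX word of `word_length` letters."""
    if len(series) % word_length != 0:
        raise ValueError("series length must be a multiple of word_length")
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")

    # 1. Z-normalise (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise Aggregate Approximation: average each equal-sized segment.
    seg = len(z) // word_length
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(word_length)]

    # 3. Breakpoints that split the standard normal into equiprobable regions.
    normal = NormalDist()
    breakpoints = [normal.inv_cdf(j / alphabet_size)
                   for j in range(1, alphabet_size)]

    # 4. Map each segment mean to the letter of the region it falls in.
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


if __name__ == "__main__":
    series = [1, 1, 1, 1, 5, 5, 5, 5]
    print(sax_word(series, word_length=2, alphabet_size=2))
    print(sax_word(series, word_length=2, alphabet_size=4))
```

    Changing either argument changes the resulting word, which is why a fixed setting may suit one dataset and not another.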

    Time series motif discovery

    MAP-i Doctoral Programme in Computer Science. Time series data are produced daily in massive proportions in virtually every field. Most of the data are stored in time series databases. Finding patterns in these databases is an important problem. These patterns, also known as motifs, provide useful insight to the domain expert and summarize the database. They have been widely used in areas as diverse as finance and medicine. Although there are many algorithms for the task, they typically do not scale and require several parameters to be set. We propose a novel algorithm that runs in linear time, is space efficient and needs only one parameter. It fully exploits the state-of-the-art time series representation technique, SAX (Symbolic Aggregate Approximation), to extract motifs at several resolutions. This property allows the algorithm to skip the expensive distance calculations that other algorithms typically employ. We also propose an approach to calculate the statistical significance of time series motifs. Although the literature contains many approaches for finding time series motifs efficiently, surprisingly none calculates a motif's statistical significance. Our proposal leverages work from the bioinformatics community by using a symbolic definition of time series motifs to derive each motif's p-value. We estimate the expected frequency of a motif using Markov chain models. The p-value is then assessed by comparing the actual frequency to the estimated one using statistical hypothesis tests. Our contribution makes it possible to apply a powerful technique, statistical tests, in a time series setting. This provides researchers and practitioners with an important tool to automatically evaluate the degree of relevance of each extracted motif. Finally, we propose an approach to automatically derive the parameters of the Symbolic Aggregate Approximation (iSAX) time series representation. This technique is widely used in time series data mining. Its popularity arises from the fact that it is symbolic, reduces the dimensionality of the series, allows lower bounding and is space efficient. However, the need to set the symbolic length and alphabet size parameters limits the applicability of the representation, since the best parameter setting is highly application dependent. Typically, these are either set to a fixed value (e.g. 8) or experimentally probed for the best configuration. The technique, referred to as AutoiSAX, not only discovers the best parameter setting for each time series in the database but also finds the alphabet size for each iSAX symbol within the same word. It is based on the simple and intuitive ideas of time series complexity and standard deviation. The technique can be smoothly embedded in existing data mining tasks as an efficient sub-routine. We analyse the impact of using AutoiSAX on visualization interpretability, classification accuracy and motif mining results. Our contribution aims to make iSAX a more general approach as it evolves towards a parameter-free method.
    Fundação para a Ciência e Tecnologia (FCT) - SFRH / BD / 33303 / 200

    Effect of angiotensin-converting enzyme inhibitor and angiotensin receptor blocker initiation on organ support-free days in patients hospitalized with COVID-19

    IMPORTANCE Overactivation of the renin-angiotensin system (RAS) may contribute to poor clinical outcomes in patients with COVID-19. OBJECTIVE To determine whether angiotensin-converting enzyme (ACE) inhibitor or angiotensin receptor blocker (ARB) initiation improves outcomes in patients hospitalized for COVID-19. DESIGN, SETTING, AND PARTICIPANTS In an ongoing, adaptive platform randomized clinical trial, 721 critically ill and 58 non–critically ill hospitalized adults were randomized to receive an RAS inhibitor or control between March 16, 2021, and February 25, 2022, at 69 sites in 7 countries (final follow-up on June 1, 2022). INTERVENTIONS Patients were randomized to receive open-label initiation of an ACE inhibitor (n = 257), ARB (n = 248), ARB in combination with DMX-200 (a chemokine receptor-2 inhibitor; n = 10), or no RAS inhibitor (control; n = 264) for up to 10 days. MAIN OUTCOMES AND MEASURES The primary outcome was organ support–free days, a composite of hospital survival and days alive without cardiovascular or respiratory organ support through 21 days. The primary analysis was a bayesian cumulative logistic model. Odds ratios (ORs) greater than 1 represent improved outcomes. RESULTS On February 25, 2022, enrollment was discontinued due to safety concerns. Among 679 critically ill patients with available primary outcome data, the median age was 56 years and 239 participants (35.2%) were women. Median (IQR) organ support–free days among critically ill patients was 10 (–1 to 16) in the ACE inhibitor group (n = 231), 8 (–1 to 17) in the ARB group (n = 217), and 12 (0 to 17) in the control group (n = 231) (median adjusted odds ratios of 0.77 [95% bayesian credible interval, 0.58-1.06] for improvement for ACE inhibitor and 0.76 [95% credible interval, 0.56-1.05] for ARB compared with control). The posterior probabilities that ACE inhibitors and ARBs worsened organ support–free days compared with control were 94.9% and 95.4%, respectively. 
Hospital survival occurred in 166 of 231 critically ill participants (71.9%) in the ACE inhibitor group, 152 of 217 (70.0%) in the ARB group, and 182 of 231 (78.8%) in the control group (posterior probabilities that ACE inhibitor and ARB worsened hospital survival compared with control were 95.3% and 98.1%, respectively). CONCLUSIONS AND RELEVANCE In this trial, among critically ill adults with COVID-19, initiation of an ACE inhibitor or ARB did not improve, and likely worsened, clinical outcomes. TRIAL REGISTRATION ClinicalTrials.gov Identifier: NCT0273570